k-mw-modes: An algorithm for clustering categorical matrix-object data

Authors

  • Fuyuan Cao
  • Liqin Yu
  • Joshua Zhexue Huang
  • Jiye Liang
Abstract

In data mining, the input of most algorithms is a set of n objects, each described by a single feature vector. However, in many real database applications, an object is described by more than one feature vector. In this paper, we call an object described by more than one feature vector a matrix-object, and a data set consisting of matrix-objects a matrix-object data set. We propose a k-multi-weighted-modes (abbr. k-mw-modes) algorithm for clustering categorical matrix-object data. In this algorithm, we define the distance between two categorical matrix-objects and propose a multi-weighted-modes representation of cluster prototypes. We give a heuristic method to choose the locally optimal multi-weighted-modes in each iteration of the k-mw-modes algorithm. We validated the effectiveness and benefits of the k-mw-modes algorithm on five real data sets from different applications. © 2017 Elsevier B.V. All rights reserved.

Keywords: categorical data; matrix-object; k-mw-modes algorithm
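To make the notion of a distance between two categorical matrix-objects concrete, here is a minimal Python sketch. The abstract does not state the paper's actual definition, so everything below is an assumption for illustration: rows are compared with the simple matching dissimilarity used in standard k-modes, and the hypothetical function `matrix_object_distance` averages, over the rows of both objects, each row's distance to its nearest row in the other object. It should not be read as the authors' measure.

```python
from typing import Sequence

Row = Sequence[str]  # one categorical feature vector


def simple_matching(a: Row, b: Row) -> int:
    """Count the attributes on which two feature vectors disagree."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def matrix_object_distance(X: Sequence[Row], Y: Sequence[Row]) -> float:
    """Hypothetical distance between two matrix-objects (assumption, not the paper's).

    Each matrix-object is a non-empty collection of feature vectors. Every row
    is matched to its nearest row in the other object, and the resulting
    distances are averaged over all rows of both objects.
    """
    if not X or not Y:
        raise ValueError("matrix-objects must contain at least one row")
    x_to_y = sum(min(simple_matching(x, y) for y in Y) for x in X)
    y_to_x = sum(min(simple_matching(y, x) for x in X) for y in Y)
    return (x_to_y + y_to_x) / (len(X) + len(Y))


# Two customers, each described by several (item, payment) records.
customer_a = [("bread", "cash"), ("milk", "card")]
customer_b = [("bread", "cash"), ("milk", "cash"), ("eggs", "card")]
distance_ab = matrix_object_distance(customer_a, customer_b)
```

The averaging rule keeps the measure symmetric and lets the two objects have different numbers of rows, which is the situation the abstract describes; other aggregation rules would serve equally well for illustration.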

Similar resources

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

In the past few decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...

3D Object Retrieval Based on PSO-K-Modes Method

By using the semantic attributes of 3D objects, the user can search for target objects. The main advantage is that the user does not need to sketch a 3D object as the query for 3D object retrieval, and the retrieval system can achieve better retrieval performance. There are many categorical data among these attributes, and how to use them to find the most similar objects is a vital pr...

Genetic Distance Measure for K-modes Algorithm

The k-means algorithm has been shown to be an effective and efficient algorithm for clustering. However, it was developed for numerical data only and is not suitable for clustering non-numerical data. The k-modes algorithm was developed for clustering categorical objects by extending the k-means algorithm. However, no one applies this technique for classification of c...

Improving K-Modes Algorithm Considering Frequencies of Attribute Values in Mode

The original k-means algorithm is designed to work primarily on numeric data sets. This prohibits the algorithm from being applied to categorical data clustering, which is an integral part of data mining and has attracted much attention recently. The k-modes algorithm extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes versus t...
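The frequency-based mode update mentioned in this abstract can be sketched in a few lines of Python. This is an illustration of the standard k-modes idea (take the most frequent value of each attribute within a cluster), not the improved method of the listed paper; the helper name `cluster_mode` is hypothetical, and breaking ties in favour of the value seen first is an assumption.

```python
from collections import Counter
from typing import Sequence, Tuple


def cluster_mode(rows: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Return the per-attribute most frequent value of a cluster's rows."""
    if not rows:
        raise ValueError("cluster must contain at least one row")
    # zip(*rows) iterates over attributes (columns) instead of objects (rows).
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


cluster = [("red", "small"), ("red", "large"), ("blue", "large")]
mode = cluster_mode(cluster)
```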

Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. The weights of attribute values contribute much to clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency...

Journal:
  • Appl. Soft Comput.

Volume 57  Issue 

Pages  -

Publication date 2017